NSF PAR Search | NSF Public Access Repository


Grey-Box Machine Learning Prediction of Parallel Application Scaling

Alasandagutti, Akhil; Bridges, Patrick G; Estrada, Trilce (December 2025, Proceedings of the 32nd IEEE International Conference on High Performance Computing, Data, and Analytics)

Accurate prediction of parallel application performance in HPC systems is essential for efficient resource allocation and system design. Classical performance models estimate speedup based on theoretical assumptions, but their applicability is limited by parameter estimation, data acquisition, and real-world system issues such as latency and network congestion. This paper describes performance prediction using classical performance models boosted by a trainable machine learning framework. Domain-informed machine-learning models estimate the overhead of an application for a given problem size and resource configuration as a coefficient of the estimated speedup provided by performance laws. We evaluate this approach on two HPC mini-applications and two full applications with varying patterns of computation and communication, and also evaluate the prediction accuracy on runs with varying processors-per-node configurations. Our results show that this method significantly improves the accuracy of performance predictions over standard analytical models and black-box regressors, while remaining robust even with limited training data.
Free, publicly-accessible full text available December 17, 2026
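
The grey-box idea in the abstract above (a learned overhead coefficient multiplying the speedup given by a performance law) can be illustrated with a minimal sketch. This is not the authors' implementation: Amdahl's law is assumed as the performance law, the "learned" model is a plain least-squares line in log2 of the processor count, problem size is ignored for brevity, and all function names are hypothetical.

```python
import math

def amdahl_speedup(parallel_fraction, procs):
    """Classical Amdahl's-law speedup (assumed here as the performance law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / procs)

def fit_overhead_coeff(runs, parallel_fraction):
    """Fit coeff(procs) = a + b * log2(procs) by ordinary least squares.

    runs: list of (procs, measured_speedup) pairs (hypothetical training data).
    The regression target is the ratio of measured speedup to Amdahl speedup.
    """
    xs = [math.log2(p) for p, _ in runs]
    ys = [s / amdahl_speedup(parallel_fraction, p) for p, s in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda procs: intercept + slope * math.log2(procs)

def predict_speedup(parallel_fraction, procs, coeff):
    """Grey-box prediction: learned coefficient times analytical speedup."""
    return coeff(procs) * amdahl_speedup(parallel_fraction, procs)
```

A trained regressor that also takes problem size and processors-per-node as features would replace the one-variable fit in this sketch.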
Increasing the Efficiency of Ensemble Molecular Dynamics Simulations with Termination of Unproductive Trajectories Identified at Runtime

https://doi.org/10.1021/acs.jpca.4c05182

Marquez, Jack; Cuendet, Michel A; Caino-Lores, Silvina; Estrada, Trilce; Deelman, Ewa; Weinstein, Harel; Taufer, Michela (February 2025, The Journal of Physical Chemistry A)
Scaling Laws for the Workload Throughput of Emerging Heterogeneous Clusters

https://doi.org/10.1109/CCGRID64434.2025.00025

Alasandagutti, Akhil; Suetterlein, Joshua; Firoz, Jesun; Young, Stephen; Manzano, Joseph; Stewart, Jason R; Bridges, Patrick G; Estrada, Trilce; Barker, Kevin (May 2025, IEEE)

Next-generation HPC clusters are evolving into highly heterogeneous systems that integrate traditional computing resources with emerging accelerator technologies such as quantum processors, neuromorphic units, dataflow architectures, and specialized AI accelerators within a unified infrastructure. These advanced systems enable workloads to dynamically utilize different accelerators during various computation phases, creating complex execution patterns. The performance of the workloads can therefore be impacted by many factors, including how the accelerators are shared, their utilization, and their placement within the system. Moreover, effects such as the system and network state due to the overall system load can significantly impact the job completion rate. Understanding, identifying, and quantifying the impact of the most critical factors (e.g., the number of allocated accelerators) will help guide investment decisions for accelerator acquisition and deployment that can improve the overall system throughput. This paper extensively studies these complex interactions among advanced accelerators within an HPC cluster and various workloads. We introduce a novel analytical model that predicts the speedup of a workload given an accelerator/system configuration. This model can be used to quantify the effect of adding accelerators on the performance of jobs running on an HPC cluster. We validate the model using both simulated and real environments.
Free, publicly-accessible full text available May 19, 2026
nidiamcl/stream-graph: Initial Public Release

https://doi.org/10.5281/zenodo.10631952

Vaquera, Nidia; Estrada, Trilce; Jafari Khouzani, Soheila; Bridges, Patrick G. (February 2024, Zenodo)

This is the initial public release of the NSF-funded PASCAL-G algorithm, which includes the MPI implementation we developed.
sjafari2/K8sKafkaPipeline: K8s-Kafka-DataStreaming-Pipeline

https://doi.org/10.5281/zenodo.10631950

Jafari Khouzani, Soheila; Vaquera, Nidia; Estrada, Trilce; Bridges, Patrick G. (January 2024, Zenodo)

This is the initial public release for an NSF-funded project that develops a Kafka pipeline orchestrated in Kubernetes to run data streaming in real time.
DRUM: A Real Time Detector for Regime Shifts in Data Streams via an Unsupervised, Multivariate Framework

Bashir, Adnan; Estrada, Trilce (August 2023, Lecture notes in computer science)

In this work we present DRUM, an unsupervised approach that is based on statistical properties of multivariate data streams to identify regime shifts in real time. DRUM processes streams in small chunks, learns their statistical properties, and makes generalizations as time goes by. We show how this straightforward approach requires minimal computation and reaches state-of-the-art accuracy, making it ideal for embedded and cyber-physical systems.
Full Text Available
Online Boosted Gaussian Learners for In-Situ Detection and Characterization of Protein Folding States in Molecular Dynamics Simulations

https://doi.org/10.1109/e-Science58273.2023.10254895

Sahni, Harshita; Carrillo-Cabada, Hector; Kots, Ekaterina; Caino-Lores, Silvina; Marquez, Jack; Deelman, Ewa; Cuendet, Michel; Weinstein, Harel; Taufer, Michela; Estrada, Trilce (October 2023, Proceedings of the 19th IEEE International Conference on e-Science (eScience))

Full Text Available
Runtime Steering of Molecular Dynamics Simulations Through In Situ Analysis and Annotation of Collective Variables

https://doi.org/10.1145/3592979.3593420

Caino-Lores, Silvina; Cuendet, Michel; Marquez, Jack; Kots, Ekaterina; Estrada, Trilce; Deelman, Ewa; Weinstein, Harel; Taufer, Michela (June 2023, ACM)

Full Text Available
NSF/IEEE-TCPP Curriculum on Parallel and Distributed Computing for Undergraduates - Version II - Big Data, Energy, and Distributed Computing

https://doi.org/10.1145/3545947.3569594

Prasad, Sushil; Weems, Charles; Sussman, Alan; Gupta, Anshul; Estrada, Trilce; Vaidyanathan, Ramachandran; Ghafoor, Sheikh; Kant, Krishna; Stunkel, Craig (March 2022, ACM)

This special session will report on the updated NSF/IEEE-TCPP Curriculum on Parallel and Distributed Computing released in Nov 2020 by the Center for Parallel and Distributed Computing Curriculum Development and Educational Resources (CDER). The purpose of the special session is to obtain SIGCSE community feedback on this curriculum in a highly interactive manner employing the hybrid modality and supported by a full-time CDER booth for the duration of SIGCSE. In this era of big data, cloud, and multi- and many-core systems, it is essential that the computer science (CS) and computer engineering (CE) graduates have basic skills in parallel and distributed computing (PDC). The topics are primarily organized into the areas of architecture, programming, and algorithms topics. A set of pervasive concepts that percolate across area boundaries are also identified. Version 1 of this curriculum was released in December 2012. That curriculum guideline has over 140 early adopter institutions worldwide and has been incorporated into the 2013 ACM/IEEE Computer Science curricula. This Version-II represents a major revision. The updates have focused on enhancing coverage related to the topical aspects of Big Data, Energy, and Distributed Computing. The session will also report on related CDER activities including a workshop series on a PDC institute conceptualization, developing a CE-oriented version of the curriculum, and identifying a minimal set of PDC topics aligned with ABET’s exposure-level PDC requirements. The interested SIGCSE audience includes educators, authors, publishers, curriculum committee members, department chairs and administrators, professional societies, and the computing industry.
Girasol, a sky imaging and global solar irradiance dataset

https://doi.org/10.1016/j.dib.2021.106914

Terrén-Serrano, Guillermo; Bashir, Adnan; Estrada, Trilce; Martínez-Ramón, Manel (April 2021, Data in Brief)
Full Text Available
